Method and apparatus for implementing virtual performance partner

ABSTRACT

A method and apparatus for implementing a virtual performance partner are provided. The method includes collecting audio frame data performed by a performer; and for each piece of current audio frame data collected, performing: converting the piece of current audio frame data collected into a current digital score, matching the current digital score with a range of digital scores in a repertoire, and determining a matching digital score in the range of digital scores that matches the current digital score; positioning a position of the matching digital score in the repertoire, and determining a start time of playing a cooperation part of music in a next bar of the matching digital score in the repertoire for a performance partner.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a bypass continuation application of International Application No. PCT/KR2023/002880, filed on Mar. 2, 2023, which is based on and claims priority to Chinese Patent Application No. 202210329134.2, filed on Mar. 30, 2022, in the China National Intellectual Property Administration, the disclosures of which are incorporated by reference herein in their entireties.

TECHNICAL FIELD

The disclosure relates to an audio processing technology, and more particularly, to a method and apparatus for implementing a virtual performance partner.

BACKGROUND ART

As quality-oriented education in China has increased, enthusiasm for music has increased, and an important part of music skills is musical instrument performance. In an ordinary musical instrument performance, a performer and a cooperator may cooperatively perform a piece of music. The performer plays a dominant role in the musical cooperation. The cooperator should follow the performance of the performer. For example, in a violin performance, a performer plays the violin, and a symphony orchestra as a cooperator performs with the violin performer.

In traditional musical instrument performance exercises, the performer usually follows a recording, e.g., a compact disc (CD). However, since the performer may have a limited skill level or the performed music may be a highly difficult virtuoso composition, the performer is likely unable to keep up with the recording of an artist on the CD, so the experience of the current performance exercise is poor.

SUMMARY

Provided are a method and apparatus for implementing a virtual performance partner, which enable music playing to be adapted to the performance progress of performers and improve the performance experience of performers.

In accordance with an aspect of the disclosure, a method for providing a virtual performance partner includes: collecting audio frame data performed by a performer; and for each piece of current audio frame data collected, performing: converting the piece of current audio frame data collected into a current digital score, matching the current digital score with a range of digital scores in a repertoire, and determining a matching digital score in the range of digital scores that matches the current digital score; positioning a position of the matching digital score in the repertoire, and determining a start time of playing a cooperation part of music in a next bar of the matching digital score in the repertoire for a performance partner; and determining a performance error between the performer and the performance partner based on a performance time of the current digital score and a performance time of the matching digital score, and adjusting a playing speed of the performance partner for the cooperation part in the repertoire based on the performance error.

The method may include determining, based on a position of the current digital score, the range within which a next audio frame is matched.

The determining the start time of playing the cooperation part of music in the next bar of the matching digital score in the repertoire for the performance partner may include: determining a performance speed of the performer based on the position of the matching digital score and positions of matching digital scores corresponding to first N pieces of audio frame data in the current audio frame data, and identifying the performance speed as a reference playing speed of the repertoire; and determining a start time of playing a next bar of music of the matching digital score in the repertoire for the performance partner based on the reference playing speed.

The adjusting the playing speed of the performance partner for the repertoire based on the performance error may include: based on the performance error being less than one beat, adjusting the playing speed of the performance partner within a current bar of music based on the performance error based on the reference playing speed, to make the performance partner consistent with the performer in a performance end time of the current bar of music; and based on the performance error being greater than one beat, pausing playing, by the performance partner, at the current bar, and playing a next bar of music based on a playing time of the next bar of music.

The method may include, based on repeated segments being contained in the repertoire, receiving an inputted set performance segment, and identifying the performance segment as the range.

The method may include, based on the repertoire starting from a performance of the performance partner, playing, by the performance partner, a part of the repertoire prior to the performance of the performer based on a set playing speed.

The method may include, based on the repertoire transitioning from a solo part of the performer to a performance part of the performance partner: based on a performance speed of the performer changing, starting to play, by the performance partner, the repertoire based on a performance speed at an end of the solo part; and based on the performance speed of the performer staying constant, starting to play, by the performance partner, the repertoire based on a set playing speed.

The performance partner ends the playing of the repertoire based on the current digital score not being matched successfully within a first set time.

The converting the piece of current audio frame data collected into the current digital score may include: processing the piece of current audio frame data collected using a pre-trained neural network model, and outputting the current digital score corresponding to the piece of current audio frame data collected.

The current digital score may be represented using a binary saliency map, and the pre-trained neural network model may be trained using a binary classification cross entropy loss function.

The matching may be implemented using a neural-network processor.

The method may include outputting a score of the repertoire and the position determined by the positioning.

The method may include determining a current scene based on the position determined by the positioning, and synthesizing, corresponding to the current scene, a virtual performance animation corresponding to the current scene using an avatar pre-selected by the performer.

Based on there being a plurality of performance users, the performer may be a preset performance user among the plurality of performance users; and based on the matching being unsuccessful within a preset time, the performer may be switched to a next preset performance user among the plurality of performance users.

Based on there being a plurality of performance users, an avatar pre-selected by each user may be stored; based on the virtual performance animation being displayed, a virtual performance animation synthesized using an avatar pre-selected by a current performer may be displayed, and based on the performer being switched, the virtual performance animation may be switched to a virtual performance animation synthesized using an avatar pre-selected by a performer switched to; or, avatars pre-selected by all the performance users are displayed simultaneously, and a desired virtual performance animation may be synthesized.

The synthesizing, corresponding to the current scene, the virtual performance animation corresponding to the current scene using an avatar pre-selected by the performer may include: pre-setting an animation switching position in the repertoire, and based on a performance progress of the repertoire by the performance partner reaching the animation switching position, changing the virtual performance animation; and/or, based on the current digital score not being matched successfully and/or the performance error corresponding to the current digital score being greater than a set threshold, changing an avatar preset by the performer into a preset action, and synthesizing the virtual performance animation.

The animation switching position may be set based on an input of a performance user, or the animation switching position may be contained in the repertoire.

The animation switching position may be a position of switching between different musical instruments within the cooperation part in the repertoire, wherein the changing the virtual performance animation may include displaying a virtual performance animation preset corresponding to a performance of a musical instrument switched to corresponding to the switching position between the different musical instruments.

In accordance with an aspect of the disclosure, an apparatus for implementing a virtual performance partner includes: a processor configured to: collect audio frame data performed by a performer; convert, for each piece of current audio frame data collected, the piece of current audio frame data collected into a current digital score, match the current digital score with a range of digital scores in a repertoire, and determine a matching digital score in the range of digital scores that matches the current digital score; position, for each piece of the current audio frame data collected, a position of the matching digital score in the repertoire, and determine a start time of playing a cooperation part of music in a next bar of the matching digital score in the repertoire for a performance partner; and determine, for each piece of the current audio frame data collected, a performance error between the performer and the performance partner based on a performance time of the current digital score and a performance time of the matching digital score, and adjust a playing speed of the performance partner for the cooperation part in the repertoire based on the performance error.

In accordance with an aspect of the disclosure, a method for providing a virtual performance partner includes: receiving a current digital score corresponding to audio frame data; matching the current digital score with a range of digital scores in a repertoire; determining a matching digital score in the range of digital scores based on matching the current digital score; identifying a position of the matching digital score in the repertoire, and identifying a start time of playing a cooperation part of music in a next bar of the matching digital score in the repertoire for a performance partner; determining a performance error between the performer and the performance partner based on a performance time of the current digital score and a performance time of the matching digital score; and adjusting a playing speed of the performance partner for the cooperation part in the repertoire based on the performance error.

The method may include determining, based on a position of the current digital score, the range within which a next audio frame may be matched.

The method may include, based on the repertoire starting from a performance of the performance partner, playing, by the performance partner, a part of the repertoire prior to the performance of the performer based on a set playing speed.

The performance partner may end the playing of the repertoire based on the current digital score not being matched successfully within a first set time.

BRIEF DESCRIPTION OF THE DRAWINGS

The above and other aspects, features, and advantages of certain embodiments of the present disclosure will be more apparent from the following description taken in conjunction with the accompanying drawings, in which:

FIG. 1 is a schematic diagram of a basic flow of a method for implementing a virtual performance partner, according to an embodiment;

FIG. 2 is a schematic diagram of a system architecture, according to an embodiment;

FIG. 3 is a schematic diagram of a flow of a method for implementing a virtual performance partner, according to an embodiment;

FIG. 4 is a schematic diagram of training a neural network model, according to an embodiment; and

FIG. 5 is a schematic diagram of a basic structure of an apparatus for implementing a virtual performance partner, according to an embodiment.

DETAILED DESCRIPTION OF THE INVENTION

The embodiments described below do not represent all technical aspects of the disclosure. It should be understood that various equivalents or variations that may be substituted for them at the time of the present application belong to the scope of rights of the disclosure.

If a detailed description for the functions or configurations related to the present disclosure may unnecessarily obscure the gist of the present disclosure, the detailed description may be omitted. In addition, the following embodiments may be modified in several different forms, and the scope and spirit of the disclosure are not limited to the following embodiments. Rather, these embodiments are provided to make the present disclosure thorough and complete, and to completely transfer the spirit of the present disclosure to those skilled in the art.

However, it is to be understood that technologies mentioned in the present disclosure are not limited to specific embodiments, and include all modifications, equivalents and/or substitutions according to embodiments of the present disclosure. Throughout the accompanying drawings, similar components are denoted by similar reference numerals.

The expressions “first,” “second” and the like, used in the present disclosure may indicate various components regardless of a sequence and/or importance of the components. These expressions are only used in order to distinguish one component from the other components, and do not limit the corresponding components.

In the present disclosure, the expression “A or B,” “at least one of A and/or B” or “one or more of A and/or B” or the like, may include all possible combinations of items enumerated together. For example, “A or B,” “at least one of A and B,” or “at least one of A or B” may indicate all of 1) a case where at least one A is included, 2) a case where at least one B is included, or 3) a case where both of at least one A and at least one B are included.

A term of a singular form may include its plural forms unless the context clearly indicates otherwise. It is to be understood that a term “include” or “formed of” used in the specification specifies the presence of features, numerals, steps, operations, components, parts or combinations thereof, which is mentioned in the specification, and does not preclude the presence or addition of one or more other features, numerals, steps, operations, components, parts or combinations thereof.

In case that any component (for example, a first component) is mentioned to be “(operatively or communicatively) coupled with/to” or “connected to” another component (for example, a second component), it is to be understood that the any component is directly coupled to the another component or may be coupled to the another component through another component (for example, a third component). On the other hand, in case that any component (for example, the first component) is mentioned to be “directly coupled” or “directly connected to” another component (for example, the second component), it is to be understood that the other component (for example, the third component) is not present between any component and another component.

An expression “configured (or set) to” used in the present disclosure may be replaced by an expression “suitable for,” “having the capacity to,” “designed to,” “adapted to,” “made to” or “capable of” based on a situation. A term “configured (or set) to” may not necessarily indicate “specifically designed to” in hardware. Instead, an expression “an apparatus configured to” may indicate that the apparatus may “perform—” together with other apparatuses or components. For example, “a processor configured (or set) to perform A, B, and C” may indicate a dedicated processor (for example, an embedded processor) for performing the corresponding operations or a generic-purpose processor (for example, a central processing unit (CPU) or an application processor) which may perform the corresponding operations by executing one or more software programs stored in a memory apparatus.

In the performance of musical instruments, a cooperator should follow the performance of a performer. Since music is sensitive to tempo, a slight delay may cause discomfort of hearing. Based on this, the disclosure provides a method and apparatus for implementing a virtual performance partner, which can adaptively adjust the content and speed of a repertoire played by a player according to the audio of the performer, and in particular can make adjustments according to the tempo of the performer.

Specifically, a performance partner is a virtual device that controls the player to play a piece of music, and a performer is a user who uses the performance partner to accompany the performance of his or her own musical instrument. The performer performs an actual performance of certain music A. The player plays a specified part of the corresponding music A, which is usually an accompaniment part for the music A performed by a certain musical instrument. Hereinafter, the entire music performed is referred to as a repertoire. A complete score of the repertoire and a corresponding audio file to be played by the performance partner may be pre-stored, e.g., in a server. The performance partner acquires, according to a request for the music from a performance user, the stored score of the repertoire and the audio file to be played by the performance partner. The complete score includes a score of a part to be performed by the performer and a score corresponding to the audio file to be played by the performance partner (hereinafter referred to as a score of a cooperation part). In addition, in order to provide more options, it is possible to store a plurality of audio files to be played by the performance partner corresponding to one repertoire, e.g., audio files corresponding to different musical instruments. Therefore, the performance user may be provided with versions of different musical instrument accompaniments for the same repertoire.

FIG. 1 is a schematic diagram of a basic flow of a method for implementing a virtual performance partner in the disclosure, according to an embodiment. As shown in FIG. 1, the method includes the following steps (e.g., operations).

In operation 101, audio frame data performed by a performer is collected.

For each piece of current audio frame data collected in operation 101, the following operations 102-104 are executed.

In operation 102, the piece of current audio frame data collected is converted into a current digital score, the current digital score is matched with a range of digital scores in a repertoire, and a matching digital score in the range of digital scores that matches the current digital score is determined.

Operation 102 is used to perform the operations of score recognizing and score matching. The collected audio frame data is first converted into digital scores. A digital score into which the piece of current audio frame data collected is converted is referred to as the current digital score. The audio data of the repertoire is pre-converted into a digital score, and the current digital score converted is matched with the entire repertoire or the range of digital scores in the repertoire. The part of the digital score in the repertoire that successfully matches the current digital score is referred to as the matching digital score. It can be seen therefrom that each piece of current audio frame data collected is converted into a current digital score and a corresponding matching digital score is determined. When matching of digital scores is performed, the digital score of the entire repertoire matched with the current digital score should be the score of the part performed by the performer.

In operation 103, a position of the matching digital score is positioned in the repertoire, and a start time of playing a cooperation part of music in a next bar of the matching digital score in the repertoire is determined for a performance partner.

Operation 103 is used to perform score positioning. For the matching digital score determined in operation 102, the position of the matching digital score in the entire repertoire, i.e., the position of a part currently being performed by the performer in the entire music, is determined. It is determined, using the position information, when the performance partner starts to play the music content in the next bar of the positioned position. The method of determining the start time of playing the next bar is described in detail in the following embodiments.

If the performance partner supports displaying the score being performed, the positioned position may also be indicated in that score.

In operation 104, a performance error between the performer and the performance partner is determined based on a performance time of the current digital score and a performance time of the matching digital score, and a playing speed of the performance partner for the cooperation part in the repertoire is adjusted based on the performance error.

Operation 104 is used to process tempo tracking. Specifically, an error between the performance of the performer and the play of the performance partner is determined, so that tempo tracking is performed to adjust the playing speed of the performance partner for the repertoire.

Through the above flow, it is possible to recognize the audio being performed by the performer and position the part being performed in the music, so as to enable the performer to control the playing progress of the player at each bar of the music. Tempo tracking is performed using the performance error between the performer and the performance partner, to further adjust the playing speed of the player.

As shown in the flow of FIG. 1, operations 102-104 should be processed for each audio frame, according to an embodiment. Therefore, the processing should be fast. Hardware such as an NPU (Neural-network Processing Unit) (e.g., a neural-network processor) may be used as a support to implement the above method. The performance partner analyzes the audio and matches the scores in real time, which is a computation-intensive scenario with high energy consumption. Issues such as power consumption may be ignored when the operations are deployed on a desktop device. In addition, a sufficient volume may be useful for the performance partner to cooperate with the musical instrument. Therefore, a television may be used as a suitable deployment platform to realize a specific method of the disclosure.

A specific implementation of the method for implementing a virtual performance partner in the disclosure will be described below with an embodiment. In the present embodiment, a television is selected as a deployment platform of the corresponding method. The entire system architecture, as shown in FIG. 2, includes a television, a microphone connected to the television, and a sound box connected to the television. FIG. 3 is a schematic diagram of a specific flow of the method for implementing a virtual performance partner in the present embodiment. The flow is explained by taking the processing of an audio frame as an example. As shown in FIG. 3, the processing of an audio frame includes the following steps, according to an embodiment.

In operation 301, current audio frame data of a performer is acquired.

The processing of operation 301 may be performed in a variety of existing ways. For example, a microphone may be plugged into a television to collect the audio of the performer.

In operation 302, the current audio frame data is converted into a current digital score using a pre-trained neural network model.

Operation 302 is used for score recognizing, i.e., converting the audio collected by the microphone into a digital score that can be subsequently processed. The digital score may be represented in various existing ways, such as a binary saliency map. Specifically, the pitch of music includes 88 keys from great A2 to small c5 at intervals of a semitone. Based on this, a digital score may be represented as a two-dimensional matrix with an X axis representing time coordinates and a Y axis representing pitch coordinates, thereby generating a binary saliency map.
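
As a non-authoritative illustration of this representation (not part of the original disclosure), the sketch below builds such a binary saliency map in Python, assuming the standard 88-key piano range mapped to MIDI pitches 21-108 and a fixed analysis frame rate; the note list and frame rate are hypothetical values.

    import numpy as np

    NUM_PITCHES = 88      # 88 piano keys, assumed mapped to MIDI pitches 21..108
    FRAME_RATE = 50       # analysis frames per second (assumed value)

    def saliency_map(notes, duration_s):
        """notes: list of (midi_pitch, onset_s, offset_s) tuples."""
        num_frames = int(round(duration_s * FRAME_RATE))
        m = np.zeros((NUM_PITCHES, num_frames), dtype=np.uint8)  # Y axis: pitch, X axis: time
        for midi_pitch, onset, offset in notes:
            row = midi_pitch - 21                                 # MIDI 21..108 -> rows 0..87
            start = int(onset * FRAME_RATE)
            end = min(int(offset * FRAME_RATE), num_frames)
            m[row, start:end] = 1                                 # mark the pitch as sounding
        return m

    # Example: a one-second A4 (MIDI 69) followed by a one-second C5 (MIDI 72).
    score = saliency_map([(69, 0.0, 1.0), (72, 1.0, 2.0)], duration_s=2.0)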

In the processing of operation 302, the current audio frame data is inputted into the trained neural network model, and after it is processed by the model, the current digital score corresponding to the current audio frame data is outputted. Of course, a neural network model for converting the audio data into the digital score should be pre-trained before the entire flow shown in FIG. 3 is started. The training of the neural network model is briefly described below.

The input of the neural network model is the audio frame data collected by the microphone, and the output is the corresponding digital score, which in the present embodiment is a binary saliency map. In view of the particularity of a musical instrument performance and tonal differences between different musical instruments, a corresponding neural network model may be trained for each musical instrument.

To train the neural network, training data may be prepared in advance, including a series of audio data (collected from a process of score performance by a corresponding musical instrument) and digital scores corresponding to the audio data. A manner of acquiring the digital scores corresponding to the audio data is: knowing a score corresponding to audio data A, i.e., a score of the performed content, e.g., a staff, and representing the score in the form of a digital score, e.g., representing the corresponding score with a binary saliency map, where the digital score corresponds to audio data A. In the training of the neural network, as shown in FIG. 4, audio data and a digital score corresponding thereto constitute paired training data: the audio data is inputted into the neural network model to obtain an output of the model, the output is compared with the digital score corresponding to the audio data to calculate a loss function so as to update model parameters accordingly, and then the next audio data for the training is inputted into the model for processing until the model training is completed. In the present embodiment, the digital score is represented using the binary saliency map. Since the binary saliency map consists of binary classification labels, the neural network model may be trained using a binary classification cross entropy loss function.
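
A minimal, runnable training sketch is given below for illustration only; it assumes PyTorch, a stand-in fully connected model, and randomly generated paired data in place of real audio frames and their saliency-map labels, none of which come from the disclosure.

    import torch
    import torch.nn as nn

    # Stand-in model: maps a 1024-sample audio frame to 88 pitch logits (one per key).
    model = nn.Sequential(nn.Linear(1024, 256), nn.ReLU(), nn.Linear(256, 88))
    criterion = nn.BCEWithLogitsLoss()            # binary classification cross entropy
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

    # Dummy paired training data: audio frames and their binary saliency-map labels.
    audio = torch.randn(32, 1024)                 # 32 audio frames of features
    target = (torch.rand(32, 88) > 0.95).float()  # sparse 0/1 pitch activations

    for step in range(100):                       # predict, compare with labels, update parameters
        optimizer.zero_grad()
        loss = criterion(model(audio), target)
        loss.backward()
        optimizer.step()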

In operation 303, the current digital score is matched with a specified range of digital scores in a repertoire, and a matching digital score in the specified range of digital scores that matches the current digital score is determined, according to an embodiment.

Operation 303 is used to perform score matching, according to an embodiment. In order to perform the score matching, it may be necessary to acquire the digital score of the repertoire, which may be pre-stored in a database. The current digital score is compared with the complete digital score or the specified partial digital score of the repertoire, to perform a window search and find the most similar part to the current digital score, and the most similar part is referred to as the matching digital score. As previously described, the digital score of the repertoire for comparison may be a complete digital score or a specified partial digital score. Here, the specified partial digital score may be specified by a user, or the content currently being performed may be targeted within a certain region of the repertoire after a plurality of audio frames have been processed, and the region is taken as the specified range. When digital score matching is performed, the digital score of the part for the performer in the repertoire should be selected to be matched with the current digital score.
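
As one possible illustration (the disclosure does not specify the matching algorithm in this form), the following sketch performs a sliding-window search over the repertoire's saliency map and returns the position of the window that best agrees with the current digital score; the similarity measure and function names are assumptions.

    import numpy as np

    def match_score(current, repertoire, search_range=None):
        """current: (88, t) binary map; repertoire: (88, T) binary map.
        search_range: optional (start, end) frame indices limiting the search."""
        t = current.shape[1]
        start, end = search_range if search_range else (0, repertoire.shape[1] - t)
        best_pos, best_sim = start, -1.0
        for pos in range(start, end + 1):
            window = repertoire[:, pos:pos + t]
            sim = np.mean(window == current)      # fraction of agreeing cells
            if sim > best_sim:
                best_pos, best_sim = pos, sim
        return best_pos, best_sim                 # position of the matching digital score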

Matching may be realized using various existing searching and matching algorithms, or may be realized by a pre-trained network model. In addition, in order to accelerate the calculation and reduce the CPU load, the above matching process may be performed by an NPU.

In operation 304, a position of the matching digital score is positioned in the repertoire, and a start time of playing a cooperation part of music in a next bar of the matching digital score in the repertoire is determined for a performance partner.

As previously described, the implementation of the virtual performance partner is actually controlling the player (i.e., a television in this embodiment) to play the set playing content. The set playing content, e.g., violin accompaniment audio for a certain repertoire, may be preset by the user.

Operation 304 is used to first perform score positioning. The position in the repertoire of the matching digital score matched in operation 303 is determined. The position is a part currently being performed by the performer. A performance speed of the performer, i.e., an average performance speed within N+1 audio frame times, is calculated according to the position of the matching digital score and the positions of matching digital scores corresponding to first N pieces of audio frame data in the current audio frame data. The performance speed is used as a reference playing speed of the player playing the repertoire. Next, a start time of playing a next bar of music of the matching digital score in the repertoire is determined for the player based on the reference playing speed. That is to say, a performance start time of a next bar relative to a bar where the current audio frame is located is calculated based on the above reference playing speed. The player determines the performance start time as a start time of playing the next bar of music, so that the performance of the performer and the playing of the player are synchronized at an initial position of the next bar.
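
For illustration only (the exact formulas are not given in this form in the disclosure), the sketch below computes an average performance speed from the positions and collection times of the current match and the first N matches before it, and extrapolates the start time of the next bar; the units (beats, seconds) and names are assumptions.

    def reference_speed(positions, times):
        """positions/times: matched score positions (in beats) and collection times
        (in seconds) for the first N frames and the current frame, oldest first."""
        return (positions[-1] - positions[0]) / (times[-1] - times[0])  # beats per second

    def next_bar_start_time(positions, times, next_bar_position):
        speed = reference_speed(positions, times)
        remaining_beats = next_bar_position - positions[-1]
        return times[-1] + remaining_beats / speed   # when the player should start the next bar

    # Example: matches one beat apart every 0.5 s give 2 beats/s; the next bar,
    # 2 beats ahead, should start 1 s after the latest match (at t = 2.5 s).
    print(next_bar_start_time([10, 11, 12, 13], [0.0, 0.5, 1.0, 1.5], 15))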

In addition, when performing the score matching, the matching may be performed within a specified range of the music. The specific position of the specified range used when processing the next audio frame data may be determined based on the position obtained in the present operation.

The above processing for score positioning is the most basic processing manner, which is referred to herein as a random positioning manner, and is used when processing an ordinary audio frame, which may include when a performer enters music for the first time and when the performer starts a performance from any position. On this basis, a music theory positioning manner may be further included, which refers to processing and playing a cooperation part according to information marked in a score. The information marked in the score may be, e.g., music starting from a solo part, playing a segment at a free speed to cause difficulty in tracking, containing repeated segments in a composition, music starting from a band part, music changing from the solo part to the band part, etc.

For the above random positioning manner and music theory positioning manner, corresponding processing may be used to determine a start time of playing a next bar of the cooperation repertoire in different situations, and the foregoing determination manner is simply referred to as a random positioning algorithm. The processing allocation of random positioning and music theory positioning may be performed as follows:

When the performer enters music for the first time, when the performer starts a performance from any position, when the music starts from a solo part, and when a segment is performed at a free speed to cause difficulty in tracking, the above random positioning algorithm may be used for processing to determine a reference playing speed and a start time of playing a next bar.

When repeated segments are contained in the repertoire, an inputted set performance segment is received, the performance segment is used as the specified range, and the random positioning algorithm is executed to determine a reference playing speed and a start time of playing a next bar.

When the repertoire starts from the performance of the performance partner, the performance partner plays a part of the repertoire prior to the performance of the performer according to a set playing speed. The set playing speed may be a default playing speed.

When the repertoire transitions from a solo part of the performer to a performance part of the performance partner, the performance partner starts to play the repertoire according to a performance speed at the end of the solo part if a performance speed of the performer changes; otherwise, the performance partner starts to play the repertoire according to a set playing speed.

In operation 305, a performance error between the performer and the performance partner is determined according to a performance time of the current digital score and a performance time of the matching digital score, and a playing speed of the performance partner for the cooperation part in the repertoire is adjusted according to the performance error.

The present operation 305 is used to perform tempo tracking and adjust an actual playing speed within a bar.

First, it may be necessary to determine the performance error between the performer and the performance partner. As previously described, the digital score includes pitches and durations of respective tones in the score. The durations are referred to as the performance time of the digital score. A difference between the performance time of the current digital score and the performance time of the matching digital score is the performance error.

The manner of adjusting the playing speed according to the performance error may specifically include the following steps. When the performance error is less than one beat, on the basis of the reference playing speed determined in operation 304, the playing speed within the current bar is adjusted according to the performance error, so that the performance partner is consistent with the performer in a performance end time of the current bar of music. In this way, the performance speed can be adjusted within the current bar, and the performance speed of the performer can be caught up with within the current bar. The processing cooperates with the processing of operation 304 to ensure that the start times of playing a next bar by the performer and the performance partner are consistent.

When the performance error is greater than one beat, the performance partner pauses playing at the current bar and plays a next bar of music according to a playing time of the next bar of music. Since non-synchronization in tempo is easily perceived, if the performance error is excessive, the performance of the performance partner is paused at the current bar, i.e., the playing of the player is paused, and the cooperation part of a next bar of the score is played starting from the next bar. The processing also cooperates with the processing of operation 304.
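
The decision logic above can be summarized in the following illustrative sketch (not taken from the disclosure); it assumes the error is measured in beats, positive when the performance partner lags the performer, and that the caller supplies the reference playing speed, the beats remaining in the current bar, and the computed start time of the next bar.

    def adjust_playback(error_beats, reference_speed, beats_left_in_bar, next_bar_start_time):
        """error_beats: how far the partner lags the performer (negative if ahead)."""
        if abs(error_beats) < 1.0:
            # The performer reaches the end of the bar in (beats_left - error) / speed
            # seconds; rescale the partner's speed so it finishes the bar at that moment.
            time_available = (beats_left_in_bar - error_beats) / reference_speed
            return {"action": "adjust", "speed": beats_left_in_bar / time_available}
        # Error of one beat or more: pause at the current bar and resume at the next bar.
        return {"action": "pause", "resume_at": next_bar_start_time}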

In addition, the performance partner ends the playing of the repertoire when the current digital score is not matched successfully within a first set time (e.g., 5 seconds).

If the performer skips a bar and pauses the performance for less than the first set time, the performance of the performance partner is paused at the current bar, i.e., the playing of the player is paused, and the cooperation part of a next bar of the score is played starting from the next bar. The processing also cooperates with the processing of operation 304.

If the performance is ended or the performance is interrupted (i.e., corresponding audio frame data cannot be collected), the performance partner ends the playing of the repertoire.

Thus, the method flow in the present embodiment is ended, according to an embodiment.

On the basis of the above method, since the television also has a display function, the following processing may also be further included to improve the user experience:

Information about a performance score and a current performance position is displayed and/or outputted. Here, the current performance position determined by positioning may be displayed in real time according to the positioned result of operation 304.

A user is allowed to select an avatar of the user through settings, and a virtual performance animation synthesized from the avatar is displayed in real time according to a positioned result of the score. The avatar of the user may include a static image and a dynamic image. The static image refers to fixed materials such as a portrait, clothing, decoration, musical instruments, and stage scenery of a virtual character. The dynamic image refers to an animation action synthesized in real time by the television when the user performs, such as a character action and a camera movement. The preset animation content may be displayed according to different scenes determined by positioning the score.

Animation switching positions may be preset in the repertoire. When a performance progress of the repertoire by the performance partner reaches a certain animation switching position, the displaying of a virtual performance animation is changed. The virtual performance animation content switched to may be pre-designed. For example, the animation switching position may be set as a position of switching between different musical instruments within the cooperation part in the repertoire. Accordingly, when the performance proceeds to the animation switching position (i.e., the position of switching between different musical instruments), a virtual performance animation set in advance corresponding to the performance of the musical instrument switched to is displayed. The animation switching position may be set according to the input of the performance user before the performance starts, or the animation switching position may also be contained in the repertoire and already set when a file is initially established.

When it is detected that the volume of a device where the performance partner is located changes, the action amplitude of each avatar in the virtual animation may change according to the volume change.

When the current digital score is not matched successfully and/or a problem occurs in the tempo of the performer (e.g., a performance error corresponding to the current digital score may be greater than a set threshold), it is also possible to transform the avatar preset by the performer into a preset action and synthesize a corresponding virtual performance animation, so that when the digital score fails to be matched and/or a problem occurs in the tempo, an animation character corresponding to the performer may have a corresponding performance.

An example of corresponding virtual performance animations displayed in different scenes is given below, as shown in Table 1.

TABLE 1
Scene | Animation (dynamic image)
1. Wait for performance | A performer is in position and waves
2. Band scene 1 | A shot shows all musicians
3. Band scene 2 | A shot shows the band from various angles
4. Soloist is ready to enter | The soloist and the band are in eye contact to indicate that they are ready
5. Soloist performance | A shot focuses on the soloist
6. New part enters music | A shot focuses on people for 2-3 seconds
7. Volume change | The action amplitude of performance is adjusted with the volume
8. Tempo change | A shot focuses on a person with the greatest change amplitude for 2-3 seconds
9. End of performance | The band puts down musical instruments in greeting, and then the animation is ended
10. Performance interruption | The band grabs musical instruments and waits, and the animation ends if the waiting time exceeds 5 seconds

In addition, the above entire processing of the performance partner is performed for a specific performer. In fact, both a single-user situation and a multi-user situation may be supported in a specific implementation. This is referred to herein as performance tracking, specifically including single-user tracking and multi-user tracking. In a single-user scene, after a user completes basic settings and starts performance, the performance partner always follows the set user. In a multi-user scene, assuming that there are users A, B and C, the three users perform simultaneously as in normal musical cooperation. In this case, the performance partner follows the set user A to perform a cooperation part. If, in the tempo tracking part, user A fails to be tracked for more than a second set time (e.g., 2 seconds), the tracked object is switched to another user, and so on.
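
A minimal sketch of this switching logic is given below for illustration; the helper names, the tracking callback, and the use of a generator are assumptions, and the 2-second timeout stands in for the second set time mentioned above.

    import time

    def track_performers(users, is_tracked, timeout_s=2.0):
        """users: ordered list of performer ids; is_tracked(user) reports whether
        the user's audio is currently matched to the score."""
        current = 0
        last_seen = time.monotonic()
        while True:
            if is_tracked(users[current]):
                last_seen = time.monotonic()
            elif time.monotonic() - last_seen > timeout_s:
                current = (current + 1) % len(users)   # switch the tracked object
                last_seen = time.monotonic()
            yield users[current]                       # user the partner currently follows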

If there are a plurality of performance users, each user may pre-select a corresponding avatar. When a virtual performance animation is displayed, only an avatar corresponding to a current performer (i.e., a user being followed by the performance partner) may be displayed, and a corresponding virtual performance animation is synthesized. When the performer is switched, the virtual performance animation is switched to an avatar corresponding to the performer switched to, and a corresponding virtual performance animation is synthesized. When a virtual performance animation is displayed, it is also possible to simultaneously display avatars of all performers and synthesize corresponding virtual performance animations.

The above is the specific implementation of the method for implementing a virtual performance partner in the disclosure. By means of the above processing in the disclosure, the performance partner can perform tracking and playing according to audio performed by the performer, especially tracking in tempo, thereby adapting the performance of the music to the performance progress of the performer and improving the performance experience of the performer.

The disclosure also provides an apparatus for implementing a virtual performance partner. As shown in FIG. 5, the apparatus includes a processor for implementing: a collector 510, a score recognizer and matcher 520, a score positioner 530, and a tempo tracker 540.

The collector is configured to collect audio frame data performed by a performer in audio frames.

The score recognizer and matcher is configured to convert, for each piece of the current audio frame data collected, the piece of current audio frame data collected into a current digital score, match the current digital score with a specified range of digital scores in a repertoire, and determine a matching digital score in the specified range of digital scores that matches the current digital score.

The score positioner is configured to position, for each piece of the current audio frame data collected, a position of the matching digital score in the repertoire, and determine a start time of playing a cooperation part of music in a next bar of the matching digital score in the repertoire for a performance partner.

The tempo tracker is configured to determine, for each piece of the current audio frame data collected, a performance error between the performer and the performance partner according to a performance time of the current digital score and a performance time of the matching digital score, and adjust a playing speed of the performance partner for the cooperation part in the repertoire according to the performance error.

With the above method and apparatus of the disclosure, a virtual performance partner may be implemented. A non-limiting example is provided below:

The system settings are shown in Table 2. Under the system settings shown in the example, a user may select a composition to be performed through the settings, and set parts, a musical instrument(s) used by a performer(s), and a virtual animation image(s) of the performer(s). According to the user settings, a television acquires a digital score corresponding to the composition set by the user from a score library using the cloud service, and acquires a neural network model of the musical instrument set by the user from a sound library for performing audio-to-digital score conversion. The television collects audio data generated by the performer performing with the selected musical instrument through a microphone connected to the television. The television performs score recognition, converts the collected audio data into a digital score, positions a position of the current music after matching, and plays a cooperation part of the set music synchronously with the performance of the performer. The television synthesizes a virtual performance animation in real time for output according to the positioned position, and outputs a score and the position positioned in the score.

TABLE 2
Module | Sub-module | Description
Cloud service | Score library | Store public or copyrighted digital scores
Cloud service | Sound library | Store audio conversion models of various musical instruments
Setting | Composition | Audio file of a composition included in score library
Setting | Part | Part to be performed by performer
Setting | Musical instrument | Musical instrument used by performer
Setting | Image | Face, dress, etc. of virtual character
Input | Microphone input | Collect performed music using internal/external microphone
Television | Score recognizing | Convert audio input into score information (i.e., digital score)
Television | Score matching | Window matching of current score segment in global/partial score
Television | Score positioning | Position current music in score
Television | Tempo tracking | Infer tempo of performer for synchronous playing
Television | Animation generating | Synthesize virtual performance animation in real time
Output | Score | Display position of current music in score
Output | Sound | Synchronously play cooperation part of music
Output | Animation | Generate, play and store virtual animation in real time

In an embodiment, a method for providing a virtual performance partner may comprise: collecting audio frame data performed by a performer.

In an embodiment, the method may further comprise: for each piece of current audio frame data collected, converting the piece of current audio frame data collected into a current digital score.

In an embodiment, the method may further comprise: matching the current digital score with a range of digital scores in a repertoire.

In an embodiment, the method may further comprise: determining a matching digital score in the range of digital scores that matches the current digital score.

In an embodiment, the method may further comprise: positioning a position of the matching digital score in the repertoire.

In an embodiment, the method may further comprise: determining a start time of playing a cooperation part of music in a next bar of the matching digital score in the repertoire for a performance partner.

In an embodiment, the method may further comprise: determining a performance error between the performer and the performance partner based on a performance time of the current digital score and a performance time of the matching digital score.

In an embodiment, the method may further comprise: adjusting a playing speed of the performance partner for the cooperation part in the repertoire based on the performance error.

In an embodiment, the method may further comprise: determining, based on a position of the current digital score, the range within which a next audio frame is matched.

In an embodiment, the determining the start time of playing the cooperation part of music in the next bar of the matching digital score in the repertoire for the performance partner may comprise: determining a performance speed of the performer based on the position of the matching digital score and positions of matching digital scores corresponding to first N pieces of audio frame data in the current audio frame data; identifying the performance speed as a reference playing speed of the repertoire; and determining a start time of playing a next bar of music of the matching digital score in the repertoire for the performance partner based on the reference playing speed.

In an embodiment, the adjusting the playing speed of the performance partner for the repertoire based on the performance error may comprise: based on the performance error being less than one beat, adjusting the playing speed of the performance partner within a current bar of music according to the performance error based on the reference playing speed, to make the performance partner consistent with the performer in a performance end time of the current bar of music; and based on the performance error being greater than one beat, pausing playing, by the performance partner, at the current bar, and playing a next bar of music based on a playing time of the next bar of music.

In an embodiment, the method may further comprise: based on repeated segments being contained in the repertoire, receiving an inputted set performance segment, and identifying the performance segment as the range.

In an embodiment, the method may further comprise: based on the repertoire starting from a performance of the performance partner, playing, by the performance partner, a part of the repertoire prior to the performance of the performer based on a set playing speed.

In an embodiment, the method may further comprise: based on the repertoire transitioning from a solo part of the performer to a performance part of the performance partner: based on a performance speed of the performer changing, starting to play, by the performance partner, the repertoire based on a performance speed at an end of the solo part; and based on the performance speed of the performer staying constant, starting to play, by the performance partner, the repertoire according to a set playing speed.

In an embodiment, the performance partner may end the playing of the repertoire based on the current digital score not being matched successfully within a first set time.

In an embodiment, the converting the piece of current audio frame data collected into the current digital score may comprise: processing the piece of current audio frame data collected using a pre-trained neural network model; and outputting the current digital score corresponding to the piece of current audio frame data collected.

In an embodiment, the current digital score may be represented using a binary saliency map, and the pre-trained neural network model is trained using a binary classification cross entropy loss function.

In an embodiment, the matching is implemented using a neural-network processor.

In an embodiment, the method may further comprise: outputting a score of the repertoire and the position determined by the positioning.

In an embodiment, the method may further comprise: determining a current scene based on the position determined by the positioning; and synthesizing, corresponding to the current scene, a virtual performance animation corresponding to the current scene using an avatar pre-selected by the performer.

In an embodiment, based on there being a plurality of performance users, the performer may be a preset performance user among the plurality of performance users; and based on the matching being unsuccessful within a preset time, the performer may be switched to a next preset performance user among the plurality of performance users.

In an embodiment, based on there being a plurality of performance users, an avatar pre-selected by each user may be stored; and based on the virtual performance animation being displayed, a virtual performance animation synthesized using an avatar pre-selected by a current performer may be displayed, and based on the performer being switched, the virtual performance animation may be switched to a virtual performance animation synthesized using an avatar pre-selected by a performer switched to; or, avatars pre-selected by all the performance users may be displayed simultaneously, and a desired virtual performance animation may be synthesized.

In an embodiment, the synthesizing, corresponding to the current scene, the virtual performance animation corresponding to the current scene using the avatar pre-selected by the performer may comprise: pre-setting an animation switching position in the repertoire, and based on a performance progress of the repertoire by the performance partner reaching the animation switching position, changing the virtual performance animation; and/or based on the current digital score not being matched successfully and/or the performance error corresponding to the current digital score being greater than a set threshold, changing an avatar preset by the performer into a preset action, and synthesizing the virtual performance animation.

In an embodiment, the animation switching position may be set based on an input of a performance user, or the animation switching position is contained in the repertoire.

In an embodiment, the animation switching position may be a position of switching between different musical instruments within the cooperation part in the repertoire, and the changing the virtual performance animation may comprise displaying a virtual performance animation preset corresponding to a performance of a musical instrument switched to corresponding to the switching position between the different musical instruments.

In an embodiment, an apparatus for implementing a virtual performance partner may comprise: one or more processors configured to: collect audio frame data performed by a performer.

In an embodiment, the one or more processors may be further configured to: convert, for each piece of current audio frame data collected, the piece of current audio frame data collected into a current digital score.

In an embodiment, the one or more processors may be further configured to: match the current digital score with a range of digital scores in a repertoire.

In an embodiment, the one or more processors may be further configured to: determine a matching digital score in the range of digital scores that matches the current digital score.

In an embodiment, the one or more processors may be further configured to: position, for each piece of the current audio frame data collected, a position of the matching digital score in the repertoire.

In an embodiment, the one or more processors may be further configured to: determine a start time of playing a cooperation part of music in a next bar of the matching digital score in the repertoire for a performance partner.

In an embodiment, the one or more processors may be further configured to: determine, for each piece of the current audio frame data collected, a performance error between the performer and the performance partner based on a performance time of the current digital score and a performance time of the matching digital score.

In an embodiment, the one or more processors may be further configured to: adjust a playing speed of the performance partner for the cooperation part in the repertoire based on the performance error.

In an embodiment, a method for providing a virtual performance partner may comprise: receiving a current digital score corresponding to audio frame data.

In an embodiment, the method may further comprise: matching the current digital score with a range of digital scores in a repertoire.

In an embodiment, the method may further comprise: determining a matching digital score in the range of digital scores based on matching the current digital score.

In an embodiment, the method may further comprise: identifying a position of the matching digital score in the repertoire.

In an embodiment, the method may further comprise: identifying a start time of playing a cooperation part of music in a next bar of the matching digital score in the repertoire for a performance partner.

In an embodiment, the method may further comprise: determining a performance error between the performer and the performance partner based on a performance time of the current digital score and a performance time of the matching digital score.

In an embodiment, the method may further comprise: adjusting a playing speed of the performance partner for the cooperation part in the repertoire based on the performance error.

In an embodiment, the method may further comprise: determining, based on a position of the current digital score, the range within which a next audio frame is matched.

In an embodiment, the method may further comprise: based on the repertoire starting from a performance of the performance partner, playing, by the performance partner, a part of the repertoire prior to the performance of the performer based on a set playing speed.

In an embodiment, the performance partner may end the playing of the repertoire based on the current digital score not being matched successfully within a first set time.

The above descriptions are merely embodiments, and are not intended to limit the disclosure. Any modification, equivalent substitution or improvement made within the spirit and principles of the disclosure should be included within the protection scope of the disclosure.

What is claimed is:
 1. A method for providing a virtual performance partner, the method comprising: collecting audio frame data performed by a performer; and for each piece of current audio frame data collected, performing: converting the piece of current audio frame data collected into a current digital score; matching the current digital score with a range of digital scores in a repertoire; determining a matching digital score in the range of digital scores that matches the current digital score; positioning a position of the matching digital score in the repertoire; determining a start time of playing a cooperation part of music in a next bar of the matching digital score in the repertoire for a performance partner; determining a performance error between the performer and the performance partner based on a performance time of the current digital score and a performance time of the matching digital score; and adjusting a playing speed of the performance partner for the cooperation part in the repertoire based on the performance error.
 2. The method of claim 1, further comprising determining, based on a position of the current digital score, the range within which a next audio frame is matched.
 3. The method of claim 1, wherein the determining the start time of playing the cooperation part of music in the next bar of the matching digital score in the repertoire for the performance partner comprises: determining a performance speed of the performer based on the position of the matching digital score and positions of matching digital scores corresponding to first N pieces of audio frame data in the current audio frame data; identifying the performance speed as a reference playing speed of the repertoire; and determining a start time of playing a next bar of music of the matching digital score in the repertoire for the performance partner based on the reference playing speed.
 4. The method of claim 3, wherein the adjusting the playing speed of the performance partner for the repertoire based on the performance error comprises: based on the performance error being less than one beat, adjusting the playing speed of the performance partner within a current bar of music according to the performance error based on the reference playing speed, to make the performance partner consistent with the performer in a performance end time of the current bar of music; and based on the performance error being greater than one beat, pausing playing, by the performance partner, at the current bar, and playing a next bar of music based on a playing time of the next bar of music.
 5. The method of claim 1, further comprising: based on repeated segments being contained in the repertoire, receiving an inputted set performance segment, and identifying the performance segment as the range.
 6. The method of claim 1, further comprising, based on the repertoire starting from a performance of the performance partner, playing, by the performance partner, a part of the repertoire prior to the performance of the performer based on a set playing speed.
 7. The method of claim 1, further comprising: based on the repertoire transitioning from a solo part of the performer to a performance part of the performance partner: based on a performance speed of the performer changing, starting to play, by the performance partner, the repertoire based on a performance speed at an end of the solo part; and based on the performance speed of the performer staying constant, starting to play, by the performance partner, the repertoire according to a set playing speed.
 8. The method of claim 1, wherein the performance partner ends the playing of the repertoire based on the current digital score not being matched successfully within a first set time.
 9. The method of claim 1, wherein the converting the piece of current audio frame data collected into the current digital score comprises: processing the piece of current audio frame data collected using a pre-trained neural network model; and outputting the current digital score corresponding to the piece of current audio frame data collected, and wherein the current digital score is represented using a binary saliency map, and the pre-trained neural network model is trained using a binary classification cross entropy loss function.
 10. The method of claim 1, wherein the matching is implemented using a neural-network processor.
 11. The method of claim 1, further comprising: determining a current scene based on the position determined by the positioning; and synthesizing, corresponding to the current scene, a virtual performance animation corresponding to the current scene using an avatar pre-selected by the performer.
 12. The method of claim 1, wherein based on there being a plurality of performance users, the performer is a preset performance user among the plurality of performance users; and based on the matching being unsuccessful within a preset time, the performer is switched to a next preset performance user among the plurality of performance users.
 13. The method of claim 11, wherein based on there being a plurality of performance users, an avatar pre-selected by each user is stored; and based on the virtual performance animation being displayed, a virtual performance animation synthesized using an avatar pre-selected by a current performer is displayed, and based on the performer being switched, the virtual performance animation is switched to a virtual performance animation synthesized using an avatar pre-selected by a performer switched to; or, avatars pre-selected by all the performance users are displayed simultaneously, and a desired virtual performance animation is synthesized.
 14. The method of claim 11, wherein the synthesizing, corresponding to the current scene, the virtual performance animation corresponding to the current scene using the avatar pre-selected by the performer comprises: pre-setting an animation switching position in the repertoire, and based on a performance progress of the repertoire by the performance partner reaching the animation switching position, changing the virtual performance animation; and/or based on the current digital score not being matched successfully and/or the performance error corresponding to the current digital score being greater than a set threshold, changing an avatar preset by the performer into a preset action, and synthesizing the virtual performance animation.
 15. The method of claim 14, wherein the animation switching position is a position of switching between different musical instruments within the cooperation part in the repertoire, and wherein the changing the virtual performance animation comprises displaying a virtual performance animation preset corresponding to a performance of a musical instrument switched to corresponding to the switching position between the different musical instruments.
 16. An apparatus for implementing a virtual performance partner, the apparatus comprising: one or more processors configured to: collect audio frame data performed by a performer; convert, for each piece of current audio frame data collected, the piece of current audio frame data collected into a current digital score, match the current digital score with a range of digital scores in a repertoire, determine a matching digital score in the range of digital scores that matches the current digital score, position, for each piece of the current audio frame data collected, a position of the matching digital score in the repertoire, determine a start time of playing a cooperation part of music in a next bar of the matching digital score in the repertoire for a performance partner, determine, for each piece of the current audio frame data collected, a performance error between the performer and the performance partner based on a performance time of the current digital score and a performance time of the matching digital score, and adjust a playing speed of the performance partner for the cooperation part in the repertoire based on the performance error.
 17. A method for providing a virtual performance partner, the method comprising: receiving a current digital score corresponding to audio frame data; matching the current digital score with a range of digital scores in a repertoire; determining a matching digital score in the range of digital scores based on matching the current digital score; identifying a position of the matching digital score in the repertoire; identifying a start time of playing a cooperation part of music in a next bar of the matching digital score in the repertoire for a performance partner; determining a performance error between the performer and the performance partner based on a performance time of the current digital score and a performance time of the matching digital score; and adjusting a playing speed of the performance partner for the cooperation part in the repertoire based on the performance error.
 18. The method of claim 17, further comprising determining, based on a position of the current digital score, the range within which a next audio frame is matched.
 19. The method of claim 17, further comprising: based on the repertoire starting from a performance of the performance partner, playing, by the performance partner, a part of the repertoire prior to the performance of the performer based on a set playing speed.
 20. The method of claim 17, wherein the performance partner ends the playing of the repertoire based on the current digital score not being matched successfully within a first set time.